Cache snapshotting performance improvements #7830
Conversation
This simplifies the cache.Snapshot func to swap the hot cache to the snapshot cache instead of copying and appending entries. This reduces the amount of time the cache is write-locked, which should reduce cache contention for the read-only code paths.
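A minimal sketch of the swap approach, assuming a simplified cache with illustrative field names (the real tsm1 cache carries more state):

package cache

import "sync"

type entry struct{ values []byte } // illustrative payload

// Cache is a simplified stand-in for the tsm1 cache.
type Cache struct {
	mu       sync.RWMutex
	store    map[string]*entry
	size     uint64
	snapshot *Cache
}

// Snapshot hands the hot store to the snapshot cache wholesale instead
// of copying every entry, so the write lock is held only briefly.
func (c *Cache) Snapshot() *Cache {
	c.mu.Lock()
	defer c.mu.Unlock()

	c.snapshot.store = c.store
	c.snapshot.size = c.size

	// The hot cache restarts empty; no per-entry copying happens
	// while the lock is held.
	c.store = make(map[string]*entry)
	c.size = 0

	return c.snapshot
}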
The CacheKeyIterator (used for snapshot compactions) iterated over each key and serially encoded the values for that key as the TSM file was written. With many series, this can be slow and will only use one CPU core even if more are available. This changes it so that the key space is split among a number of goroutines that encode all keys in parallel to improve throughput.
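A rough sketch of the key-space split, with hypothetical helper names (the actual iterator also hands encoded blocks to the TSM writer in key order):

package cache

import (
	"runtime"
	"sync"
)

// encodeAll chunks the sorted key space across one goroutine per core
// so values for different keys are encoded concurrently. The PR uses
// runtime.NumCPU here; the review below switches it to GOMAXPROCS(0).
func encodeAll(keys []string, encode func(key string)) {
	concurrency := runtime.NumCPU()
	chunk := (len(keys) + concurrency - 1) / concurrency

	var wg sync.WaitGroup
	for start := 0; start < len(keys); start += chunk {
		end := start + chunk
		if end > len(keys) {
			end = len(keys)
		}
		wg.Add(1)
		go func(part []string) {
			defer wg.Done()
			for _, k := range part {
				encode(k) // serialize all values for this key
			}
		}(keys[start:end])
	}
	wg.Wait()
}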
The memory stats as well as the size of the cache were not accurate. There was also a problem where the cache size would be increased optimistically, but if the cache size limit was hit, it would not be decreased. This would cause the cache size to grow without bound with every failed write.
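A sketch of the accounting fix, assuming an atomically tracked size (the type, field, and error names are illustrative, not the cache's actual ones):

package cache

import (
	"errors"
	"sync/atomic"
)

var errCacheMemoryExceeded = errors.New("cache-max-memory-size exceeded")

type sizer struct {
	size    uint64
	maxSize uint64
}

// add grows the size optimistically, but now rolls the increase back
// when the limit is hit, so rejected writes no longer inflate the size
// without bound.
func (s *sizer) add(n uint64) error {
	if atomic.AddUint64(&s.size, n) > s.maxSize {
		atomic.AddUint64(&s.size, ^(n - 1)) // subtract n (two's complement)
		return errCacheMemoryExceeded
	}
	return nil
}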
I ran into an issue where cache snapshotting seemed to stop completely, causing the cache to fill up and never recover. I believe this is due to the Timer being reused incorrectly. Instead, use a Ticker, which fires more regularly and does not require the resetting logic (which was wrong).
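A sketch of the loop after the change, using a Ticker so no reset bookkeeping is needed (the surrounding function and its parameters are illustrative):

package cache

import "time"

// snapshotLoop checks the cache on a fixed cadence. A Ticker fires
// repeatedly on its own; the previous Timer-based version had to be
// stopped, drained, and reset on every iteration, which is easy to
// get wrong and can leave a Timer.C that never fires.
func snapshotLoop(done <-chan struct{}, interval time.Duration, check func()) {
	t := time.NewTicker(interval)
	defer t.Stop()

	for {
		select {
		case <-done:
			return
		case <-t.C:
			check()
		}
	}
}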
}

func (c *cacheKeyIterator) encode() {
	concurrency := runtime.NumCPU()
Shouldn't this be runtime.GOMAXPROCS(0) instead of runtime.NumCPU, to respect the GOMAXPROCS environment setting?
As a side note, there are a few other places in the tsm1 package where we're using NumCPU. I think those ought to all be GOMAXPROCS.
I used NumCPU because it sounds like GOMAXPROCS might go away at some point: https://golang.org/pkg/runtime/#GOMAXPROCS. I also assumed that NumCPU is the same as GOMAXPROCS(0). That might be incorrect though.
Looks like NumCPU is different than GOMAXPROCS(0). Will update to use GOMAXPROCS(0).
If and when it does go away, gofix will be updated to minimize the effort to update our code.
The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously.
There may be situations where someone would want to set it lower, possibly in conjunction with taskset to deliberately limit how many CPUs influxdb can use. In other situations, we as developers may want to increase GOMAXPROCS to shake out contention issues.
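A quick way to see the difference between the two calls; run with GOMAXPROCS=2 set in the environment on an 8-core machine, this prints 2 and 8:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) queries the current setting without changing it;
	// NumCPU always reports the machine's logical CPU count.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("NumCPU:", runtime.NumCPU())
}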
			wg.Done()
		}()
	}
	wg.Wait()
Because this method is only called in its own goroutine, the WaitGroup seems unnecessary.
Good point. Will update.
This PR has a few changes to improve cache snapshotting throughput. Under heavy write load, snapshots can start to back up. This can cause 500s to be returned with a cache-max-memory-size error, as well as more frequent write timeouts due to the cache being locked for longer periods of time.
The images below are examples of 1.1 cache and heap memory size under very heavy write load (~2M/s). They were run on a c4.8xlarge (36 cores / 64 GB RAM) with both 100k series and 1M series.
The sawtooth pattern below is the snapshots getting progressively larger and taking longer to complete. At the peaks, write errors for cache-max-memory-size being exceeded could be returned, depending on how the limit is configured. When the snapshots get larger, timeouts also start to occur when compactions kick in and use up a CPU. Heap also continues to grow in this case:
With this PR, the same write load now looks like:
A longer 10B point run across 1M series looks similar, whereas 1.1 would start hitting 500s fairly quickly.
In both cases, the cache size stayed well under the default 1GB max memory size.
The actual changes in this PR are the following:
- Cache.Snapshot was simplified to just swap the hot cache with the snapshot cache instead of iterating over every key and copying values. This reduces lock contention while the cache is write-locked.
- CacheKeyIterator encodes keys in parallel. Previously, writing a snapshot was single-threaded and would iterate over each key in order, encoding all of its values. It now encodes all keys concurrently while the writer goroutine writes them out to disk, in order, as soon as the encoded blocks are ready. This makes snapshotting leverage all available cores.
- Snapshotting could stall waiting on a Timer.C channel that never fired. This code was reusing a Timer incorrectly according to the docs, so it has been switched to a Ticker instead.
With these changes, the chances of getting a cache-max-memory-size limit error decrease significantly. Timeouts can still occur depending on system load, so increasing write-timeout may still be needed in some cases. Timeouts should occur much less frequently though.
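For reference, both settings mentioned above live in the influxdb configuration file; the values below are assumptions based on the 1.1-era defaults, so check your version's config reference before copying them:

[data]
  # Writes are rejected with a cache-max-memory-size error above this.
  cache-max-memory-size = 1048576000

[coordinator]
  # Raise this if timeouts persist under sustained heavy write load.
  write-timeout = "10s"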